

Search for: All records

Creators/Authors contains: "Zhang, Yuheng"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


  1. Free, publicly-accessible full text available February 19, 2026
  2. This paper studies the near-duplicate text alignment problem under the constraint of Jaccard similarity. Specifically, given a collection of long texts and a short query text, this problem finds all the subsequences in each text whose Jaccard similarities to the query are no smaller than a given threshold. Near-duplicate text alignment is computationally intensive. This is because there are O(n²) subsequences in a text with n tokens. To remedy this issue, a few recent studies propose to first generate the min-hash sketch of every subsequence in each text and then find all the subsequences whose min-hash sketches are similar to that of the query. They introduce the concept of compact windows and show that the O(n²k) min-hashes in a text with n tokens can be losslessly compressed in compact windows using O(nk) space, where k is the sketch size. However, the space cost O(nk) is still too high for long texts, especially when the sketch size k is large. To address this issue, we propose to use One Permutation Hashing (OPH) to generate the min-hash sketch and introduce the concept of OPH compact windows. Although the size of each sketch remains the same, which is O(k), we prove that all the O(n²k) min-hashes generated by OPH in a text with n tokens can be losslessly compressed in OPH compact windows using only O(n+k) space. Note the generation of OPH compact windows does not necessitate the enumeration of the O(n²k) min-hashes. Moreover, we develop an algorithm to find all the sketches in a text similar to that of the query directly from OPH compact windows, along with three optimizations. We conduct extensive experiments on three real-world datasets. Empirical results show our proposed algorithms significantly outperformed existing methods in terms of index cost and query latency and scaled well.
  3. Recent years have witnessed the superior performance of heterogeneous graph neural networks (HGNNs) in dealing with heterogeneous information networks (HINs). Nonetheless, the success of HGNNs often depends on the availability of sufficient labeled training data, which can be very expensive to obtain in real scenarios. Active learning provides an effective solution to tackle the data scarcity challenge. The vast majority of existing work on active learning on graphs focuses on homogeneous graphs, and thus falls short or even becomes inapplicable on HINs. In this paper, we study the active learning problem with HGNNs and propose a novel meta-reinforced active learning framework, MetRA. Previous reinforced active learning algorithms train the policy network on labeled source graphs and directly transfer the policy to the target graph without any adaptation. To better exploit the information from the target graph in the adaptation phase, we propose a novel policy transfer algorithm based on meta-Q-learning, termed per-step MQL. Empirical evaluations on HINs demonstrate the effectiveness of our proposed framework. The improvement over the best baseline is up to 7% in Micro-F1.
  4. Characterizing the structural properties of galaxies in high-redshift protoclusters is key to our understanding of the environmental effects on galaxy evolution in the early stages of galaxy and structure formation. In this study, we assess the structural properties of 85 and 87 Hα emission-line candidates (HAEs) in the densest regions of two massive protoclusters, BOSS1244 and BOSS1542, respectively, using the Hubble Space Telescope (HST) H-band imaging data. Our results show a true pair fraction of 22 ± 5 (33 ± 6) per cent in BOSS1244 (BOSS1542), which yields a merger rate of 0.41 ± 0.09 (0.52 ± 0.04) Gyr⁻¹ for massive HAEs with log (M*/M⊙) ≥ 10.3. This rate is 1.8 (2.8) times higher than that of the general fields at the same epoch. Our sample of HAEs exhibits half-light radii and Sérsic indices that cover a broader range than field star-forming galaxies. Additionally, about 15 per cent of the HAEs are as compact as the most massive (log (M*/M⊙) ≳ 11) spheroid-dominated population. These results suggest that the high galaxy density and cold dynamical state (i.e. velocity dispersion of <400 km s⁻¹) are key factors that drive galaxy mergers and promote structural evolution in the two protoclusters. Our findings also indicate that both the local environment (on group scales) and the global environment play essential roles in shaping galaxy morphologies in protoclusters. This is evident in the systematic differences observed in the structural properties of galaxies between BOSS1244 and BOSS1542.
  5. Both the computational cost and the accuracy of the invariant-imbedding T-matrix method increase with the truncation number N at which the expansions of the electromagnetic fields in terms of vector spherical harmonics are truncated. Thus, when calculating single-scattering optical properties, it becomes important to choose N just large enough to satisfy an appropriate convergence criterion; this N we call the optimal truncation number. We present a new convergence criterion that is based on the scattering phase function rather than on the scattering cross section. For a selection of homogeneous particles that have been used in previous single-scattering studies, we consider how the optimal N may be related to the size parameter, the index of refraction, and particle shape. We investigate a functional form for this relation that generalizes previous formulae involving only size parameter, a form that shows some success in summarizing our computational results. Our results indicate clearly the sensitivity of optimal truncation number to the index of refraction, as well as the difficulty of cleanly separating this dependence from the dependence on particle shape.
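As a rough illustration of the One Permutation Hashing idea mentioned in the near-duplicate text alignment abstract above, the sketch below builds a k-bin OPH sketch for a single token set and estimates Jaccard similarity from two such sketches. It is a minimal sketch under stated assumptions, not the paper's method: it does not implement compact windows or OPH compact windows, and the function names (`token_hash`, `oph_sketch`, `estimate_jaccard`), the choice of BLAKE2b as the hash, and the 32-bit hash range are all hypothetical choices made for this example. The estimator used (matching bins divided by bins that are non-empty in at least one sketch) is the standard one for OPH.

```python
import hashlib

def token_hash(token, bits=32):
    """Map a token to an integer in [0, 2**bits). Hash choice is an assumption."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") >> (64 - bits)

def oph_sketch(tokens, k=64, bits=32):
    """Build a k-bin One Permutation Hashing sketch of a token collection.

    The hash range is split into k equal-width bins; each bin keeps the
    smallest hash value that falls into it, or None if the bin is empty.
    """
    bin_width = (1 << bits) // k
    sketch = [None] * k
    for tok in set(tokens):
        h = token_hash(tok, bits)
        b = min(h // bin_width, k - 1)  # clamp when k does not divide the range
        if sketch[b] is None or h < sketch[b]:
            sketch[b] = h
    return sketch

def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity from two OPH sketches of equal size."""
    matches = 0
    both_empty = 0
    for x, y in zip(sketch_a, sketch_b):
        if x is None and y is None:
            both_empty += 1
        elif x == y:
            matches += 1
    usable = len(sketch_a) - both_empty
    return matches / usable if usable else 0.0

if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog".split()
    query = "the quick brown fox leaps over a lazy dog".split()
    print(estimate_jaccard(oph_sketch(text), oph_sketch(query)))
```

Each token is hashed once regardless of k, which is the property that distinguishes OPH from classic min-hash with k independent hash functions. With small token sets many bins stay empty, so the estimate is noisy; the example is meant only to show the data structure.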